In SEO software development, one of the most expensive and time-consuming processes is data gathering. It remains a daunting task both while the product is being built and throughout its operation. Whether you are planning to focus on keyword research, on-page SEO, SERP rank tracking, or something else, you will need to collect the necessary data.
Getting SEO data: DIY approach
The first solution is DIY. However, dealing with web scraping in-house may sound simpler than it is. Data scraping comprises several intricate components:
- Web crawling and parsing
- Structuring the data and rewriting it into a convenient format (e.g., CSV, XML)
- Storing the data in a database
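To give a sense of what even the simplest version of these three steps involves, here is a minimal sketch in Python using only the standard library. It parses links out of a page, writes them as CSV, and stores them in SQLite. The sample HTML, the table layout, and all function names are assumptions made for illustration; a production scraper would also need fetching, error handling, and deduplication.

```python
import csv
import io
import sqlite3
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collects (href, anchor text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only collect text while inside an <a> tag that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def parse_links(html):
    """Step 1: parse the page into raw records."""
    parser = LinkParser()
    parser.feed(html)
    return parser.links


def to_csv(rows):
    """Step 2: rewrite the records into a convenient format (CSV here)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["url", "anchor_text"])
    writer.writerows(rows)
    return buffer.getvalue()


def store(rows, connection):
    """Step 3: store the records in a database (SQLite here)."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS links (url TEXT, anchor_text TEXT)"
    )
    connection.executemany("INSERT INTO links VALUES (?, ?)", rows)
    connection.commit()


# Hypothetical page content standing in for a crawled document.
SAMPLE_HTML = (
    '<p><a href="https://example.com/a">First</a> and '
    '<a href="https://example.com/b">Second</a></p>'
)

rows = parse_links(SAMPLE_HTML)
csv_text = to_csv(rows)
db = sqlite3.connect(":memory:")
store(rows, db)
```

Each function is trivial on its own; the effort in practice goes into keeping the parser in step with markup that changes without notice.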
Apart from that, you will have to manage proxy pools, which entails other challenges, such as figuring out the CAPTCHA-solving logic, maintaining the infrastructure, throttling, rotation, and managing all sessions. Of course, there is an option to use out-of-the-box proxy rotators and web crawlers to lift a part of the weight off your shoulders. But it still leaves you to handle many data issues internally. Some of those issues are legal rather than technical.
First of all, web scraping can violate a particular website’s ToS and entail other legal risks if not done thoughtfully and ethically. Second, certain types of proxy IP addresses (residential or mobile) used for scraping can also cause legal concerns. All these common “proxy issues” often make businesses drop the solution and look for alternative ways to obtain data.
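On the technical side, the rotation and throttling mentioned above can be sketched as follows. This is a simplified illustration, not a production design: the class name, the round-robin strategy, and the per-proxy minimum interval are all assumptions, and real pools also have to deal with sessions, CAPTCHAs, and health checks.

```python
import itertools
import time


class ProxyRotator:
    """Round-robin rotation with a minimum delay between uses of one proxy."""

    def __init__(self, proxies, min_interval_seconds, clock=time.monotonic):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._pool = list(proxies)
        self._cycle = itertools.cycle(self._pool)
        self._min_interval = min_interval_seconds
        self._clock = clock            # injectable so the logic can be tested
        self._next_free_at = {}        # proxy -> time of its last scheduled use
        self._banned = set()

    def ban(self, proxy):
        """Mark a proxy as unusable, e.g. after the target site blocks it."""
        self._banned.add(proxy)

    def next_proxy(self):
        """Return (proxy, wait_seconds) for the next usable proxy.

        The caller is expected to sleep for wait_seconds before sending
        the request through the returned proxy.
        """
        for _ in range(len(self._pool)):
            proxy = next(self._cycle)
            if proxy in self._banned:
                continue
            now = self._clock()
            last = self._next_free_at.get(proxy)
            if last is None:
                wait = 0.0
            else:
                wait = max(0.0, self._min_interval - (now - last))
            # Record when this proxy will actually be used.
            self._next_free_at[proxy] = now + wait
            return proxy, wait
        raise RuntimeError("all proxies are banned")
```

A caller would loop over `next_proxy()`, sleep for the returned wait, send the request, and call `ban()` when a proxy gets blocked. Once every proxy is banned, the pool raises an error and scraping stops.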
Tiny Ranker began the development of its rank tracker in 2013. Initially, the team was scraping data from Google on its own, but as the company’s CEO Anders Pedersen put it, they were often “waking up to banned proxies.” With the growing customer base, managing proxies was only becoming more complex, and the proxy-based system turned out to be unreliable and inefficient. The company started looking for alternative solutions with better possibilities for growth and, in the end, chose DataForSEO SERP API as its data source.
Alternative way to get SEO data
There is a different approach that proves more cost-effective than DIY: integrating data from a third-party provider. It can spare you both the legal issues and a great deal of the time and resources you’d have to spend on managing data collection in-house.
For one thing, your data provider is liable for any scraping abuse. But this shouldn’t be an issue at all, because well-known companies value their reputation and comply with established laws.
Besides that, if you choose a reliable vendor, you will get accurate structured data ready to be integrated into the tool. A third-party data solution will reduce the time-to-market for the product and will allow your tech team to concentrate its effort on strengthening other aspects of the platform. The Chief Technical Officer at RankActive, Evgeniy Vorobyov, gave us an expert opinion on this matter. His team started by developing a rank tracking tool for their platform.
With everything handled in-house, it took them about a year to get the rank tracker up and running. As Evgeniy put it, “software development is not a place to cut corners. You have to make sure that everything works smoothly, and the system is stable beforehand. Day-to-day storing of TOP30 rankings for 1,000 keywords in Google alone generates unbelievable volumes of data in a relational database, which requires dozens of servers and hundreds of hard disks.”
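A rough back-of-the-envelope estimate shows how quickly this adds up. The TOP30 depth and the 1,000 keywords come from the quote; the quote does not say how many such keyword sets a platform tracks or how large a stored row is, so the project count and row size below are hypothetical values chosen only to illustrate the scaling.

```python
# Figures taken from the quote above.
RESULTS_PER_KEYWORD = 30      # TOP30 rankings
KEYWORDS = 1_000

# Assumptions for illustration only, not figures from RankActive.
ASSUMED_PROJECTS = 2_000      # hypothetical number of tracked keyword sets
ASSUMED_BYTES_PER_ROW = 200   # hypothetical average row size incl. indexes


def rows_per_day(projects=1):
    """Ranking rows written per day for the given number of keyword sets."""
    return RESULTS_PER_KEYWORD * KEYWORDS * projects


def rows_per_year(projects=1):
    return rows_per_day(projects) * 365


def gigabytes_per_year(projects=1):
    return rows_per_year(projects) * ASSUMED_BYTES_PER_ROW / 1e9


print(rows_per_day())                        # one keyword set, one day
print(rows_per_year())                       # one keyword set, one year
print(gigabytes_per_year(ASSUMED_PROJECTS))  # hypothetical platform, one year
```

A single keyword set already produces 30,000 rows a day, and the total grows linearly with every additional customer, search engine, location, and device you track.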
APIs in SEO
Getting data through an API (application programming interface) is one of the most convenient and widely used approaches in SEO and digital marketing software development. An API delivers the data from a provider to your tool. In most cases, you will also get useful and comprehensive documentation that facilitates the integration process.
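In practice, an integration usually boils down to building an authenticated HTTP request and reading a structured (typically JSON) response. The sketch below illustrates that pattern with Python's standard library. The base URL, the `rankings` endpoint, the bearer-token header, and the response fields are all hypothetical; every provider defines its own, so treat this as a shape rather than a recipe.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.example.com/v1"  # hypothetical provider URL


def build_request(endpoint, params, api_key):
    """Build an authenticated GET request for a hypothetical endpoint."""
    query = urllib.parse.urlencode(params)
    url = f"{API_BASE}/{endpoint}?{query}"
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers)


def parse_response(body):
    """Turn the (assumed) JSON response into (keyword, position) tuples."""
    payload = json.loads(body)
    return [(item["keyword"], item["position"]) for item in payload["results"]]


def fetch(endpoint, params, api_key):
    """Send the request; not executed here since the endpoint is made up."""
    request = build_request(endpoint, params, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_response(response.read().decode("utf-8"))


# Hypothetical response body, used in place of a live call.
SAMPLE_BODY = '{"results": [{"keyword": "seo api", "position": 3}]}'

request = build_request(
    "rankings", {"keyword": "seo api", "domain": "example.com"}, "YOUR_API_KEY"
)
parsed = parse_response(SAMPLE_BODY)
```

Compared with the scraping pipeline, there is no parsing of HTML and no proxy handling: the provider's documentation tells you which fields to expect.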
Let’s check the use cases of several of the most popular APIs:
Google Analytics API and Google Search Console API
Allow SEMrush to provide a feature for connecting user accounts with the platform. Analyzing all the necessary data from a single interface is much more convenient than logging into multiple tools and compiling it on your own.
Google AdWords API, Bing Ads API, Facebook Marketing API (Instagram Ads API incl.)
Enable WordStream to deliver cross-platform ad management software.
Google PageSpeed Insights API
Empowers the Screaming Frog SEO Spider with a feature to pull Chrome User Experience Report (CrUX) and Lighthouse metrics into the tool, which is very helpful for large-scale optimization.
Google My Business API
dbaPlatform wouldn’t exist without it, since this marketing automation software is built specifically on top of the API to optimize Google My Business listings.
Facebook, Twitter, Instagram, LinkedIn, and Pinterest APIs
Indispensable for creating social media management software, such as Sprout Social.
On the flip side, many of these APIs require you to go through a complicated process of applying for access (e.g., Google AdWords API) or let you obtain data only from accounts you own (e.g., Google Analytics API). Even though conventional APIs can empower your software with valuable features for pay-per-click campaigns and social network management, they will not provide you with broad access to data from the SEO perspective. After all, you can’t develop competitor research applications atop APIs that return data only for websites you own.
What about more SEO data through API?
Bridging the gap, many vendors are offering SEO data through API. They scrape publicly available data from search engines and websites and supply this data in a structured format. When choosing your provider, bear in mind the anticipated scale of your software.
The majority of vendors have fixed plans with packages limiting the number of API requests you can make per month. If you wish to get more flexibility, providers offering a pay-as-you-go pricing model will be the best option. Generally, usage-based pricing is more predictable. It requires no high up-front costs (e.g., at DataForSEO, we have a minimum commitment of $50) and provides unlimited possibilities to scale your application.
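The trade-off between the two pricing models can be sketched with a small calculation. All prices and quotas in the example are hypothetical and do not reflect any vendor's actual rates; the fixed-plan model also assumes that exceeding a quota means buying another whole package, which is only one of several ways vendors handle overages.

```python
import math


def fixed_plan_cost(requests_per_month, plan_price, plan_quota):
    """Monthly cost when requests are sold in whole packages.

    Assumes you must buy enough packages to cover the month's usage,
    and at least one package even if usage is lower than the quota.
    """
    packages = max(1, math.ceil(requests_per_month / plan_quota))
    return packages * plan_price


def pay_as_you_go_cost(requests_per_month, price_per_thousand):
    """Monthly cost when you pay only for the requests actually made."""
    return requests_per_month / 1000 * price_per_thousand


# Hypothetical numbers: a 100-unit plan covering 50,000 requests
# versus a usage-based price of 2 units per 1,000 requests.
for usage in (10_000, 50_000, 60_000):
    print(
        usage,
        fixed_plan_cost(usage, plan_price=100, plan_quota=50_000),
        pay_as_you_go_cost(usage, price_per_thousand=2),
    )
```

Under these assumed numbers, the fixed plan costs the same whether you use a fifth of the quota or all of it, and it jumps to a second package as soon as you go slightly over. The usage-based cost tracks the request count directly.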
DataForSEO can supply you with reliable data sources for various SEO software projects:
SERP API is designed for building rank tracking or SERP analytics tools. Supported search engines include Google, Bing, Yahoo, and Yandex. Moreover, it can empower your solution with a complete overview of all Google SERP features and supply data from Google Maps, News, and Images. With SERP API, you can design a tool where users will be able to check results for a specific device, OS, and GPS coordinates or ZIP code.
Keyword Data API and DataForSEO Labs API are created to provide a robust data foundation for keyword research tools. You can embed accurate impressions, search volume, clicks, keyword popularity trends, and more data based on Google Ads and Google Trends into your tool. Advanced filtering parameters of DataForSEO Labs API can help you to deliver an efficient solution for market-specific keyword analysis and competitor research.
On-Page API is a seamless way to obtain data for developing website audit tools. On top of it, you can develop a solution that will scan a website for every known on-page factor with a custom delay between hits.
Google Reviews API is a perfect fit for crafting sentiment analysis, reputation management, and local SEO tools. It can furnish you with customer feedback data from Google, including the text of the review, submission time, owner’s response, and reviewer’s profile info.
Google Shopping API and Amazon API can help you shape an e-commerce analytics platform. Leveraging the full product and pricing data from the top online shopping services, you will be able to devise a solution for comparing and optimizing assortment, building viable pricing strategies, and analyzing competitors’ ad campaigns.
Traffic Analytics API is a power source for competitor research and market intelligence software. It can supply your tool with website traffic estimation, bounce rate, pages per visit, engagement, and traffic sources based on data from SimilarWeb.
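To show what building on structured data looks like, take the rank tracking use case mentioned for SERP API above. Once results arrive as structured records, finding a domain's position is a few lines of code. The record layout below (`type`, `position`, `url`) is a simplified assumption for illustration and is not the actual response schema of any DataForSEO endpoint; consult the provider's documentation for the real field names.

```python
from urllib.parse import urlparse


def _strip_www(host):
    """Normalize a hostname so 'www.example.com' matches 'example.com'."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def find_rank(serp_items, target_domain):
    """Return the position of the first organic result on target_domain.

    Subdomains count as a match. Returns None if the domain is absent.
    """
    target = _strip_www(target_domain)
    for item in sorted(serp_items, key=lambda entry: entry["position"]):
        if item.get("type") != "organic":
            continue  # skip SERP features such as snippets or ads
        host = _strip_www(urlparse(item["url"]).hostname or "")
        if host == target or host.endswith("." + target):
            return item["position"]
    return None


# Hypothetical, simplified SERP records.
SAMPLE_ITEMS = [
    {"type": "featured_snippet", "position": 1, "url": "https://www.example.com/guide"},
    {"type": "organic", "position": 2, "url": "https://competitor.test/page"},
    {"type": "organic", "position": 3, "url": "https://blog.example.com/post"},
]

rank = find_rank(SAMPLE_ITEMS, "example.com")
```

Whether SERP features should count toward a ranking is a product decision; this sketch counts only organic results.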
In a nutshell, data collection is at the core of any SEO software development. Yet, there are several ways to get the data, and the choice of the path is up to you. When deciding which one works better for your case, keep in mind the amount of resources you can allocate for the project. As resources are often limited, many companies prefer going with off-the-shelf solutions to get the necessary data, so that their teams can devote more time to crafting top-of-the-line software functionality and tailoring features to the requirements of their potential customers.
For a brief rundown, we have gathered the essential differences between the two approaches to obtaining SEO data in the table below.
In-house data scraping
- Complex, time- and resource-consuming, may entail legal issues, difficult to scale
SEO data API
- Affordable pricing, ready-to-use, easy integration and scaling up